Data Fusion with Record Linkage
Abstract
Suppose there are two sources (e.g. files) consisting of records that hold different information about some units, such as people. We want to fuse the information (data) belonging to the same units. In practice, identification numbers such as the Social Security Number (SSN) are very often not available in both files, so there is some uncertainty about which records belong together. Nevertheless, we want to link the records of the two sources, ideally the right ones. Record linkage, based on the likelihood ratio test, is one method for linking records efficiently and as automatically as possible, without a large amount of review. Following Fellegi and Sunter (1969), we present the basics of record linkage as they first introduced it. We then discuss how to use record linkage in practice.
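As a rough illustration of the likelihood-ratio idea mentioned in the abstract, the sketch below scores a pair of records by summing log likelihood ratios over comparison fields and then makes a three-way decision. It is a minimal sketch and not the paper's implementation: the field names, the m- and u-probabilities, the two thresholds, and the function names `match_weight` and `classify` are all hypothetical, and the fields are assumed to be conditionally independent.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the paper):
#   m = P(field agrees | the two records belong to the same unit)
#   u = P(field agrees | the two records belong to different units)
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum of log2 likelihood ratios over all fields.

    Assumes the field comparisons are conditionally independent.
    """
    total = 0.0
    for name, (m, u) in fields.items():
        if rec_a[name] == rec_b[name]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, lower=0.0, upper=6.0):
    """Three-way decision with hypothetical thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # left for review


a = {"surname": "Meier", "birth_year": 1960, "city": "Bonn"}
b = {"surname": "Meier", "birth_year": 1961, "city": "Koeln"}
print(classify(match_weight(a, a)))  # all fields agree
print(classify(match_weight(a, b)))  # only the surname agrees
```

In the Fellegi and Sunter framework the two thresholds are chosen from the error rates one is willing to accept; the fixed values used here are placeholders only.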
Similar resources
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...
Data Preparation for Biomedical Knowledge Domain Visualization: A Probabilistic Record Linkage and Information Fusion Approach to Citation Data
Marie B Synnestvedt, Xia Lin Ph.D. This thesis presents a methodology of data preparation with probabilistic record linkage and information fusion for improving and enriching information visualizations of biomedical citation data. The problem of record l...
Enriching Knowledge Domain Visualizations: Analysis of a Record Linkage and Information Fusion Approach to Citation Data
This article presents a study of the use of data preparation for data mining methodology to prepare biomedical citation data for visualization. Deterministic record linkage models were compared with probabilistic record linkage in a situation for which the truth is known through the use of gold standard or truth datasets. The linkages are evaluated on data from the Web of Science (WOS) and Medl...
Improving record linkage with supervised learning for disclosure risk assessment
In data privacy, record linkage can be used as an estimator of the disclosure risk of protected data. To model the worst case scenario one normally attempts to link records from the original data to the protected data. In this paper we introduce a parametrization of record linkage in terms of a weighted mean and its weights, and provide a supervised learning method to determine the optimum weig...
Supervised learning using a symmetric bilinear form for record linkage
Record Linkage is used to link records of two different files corresponding to the same individuals. These algorithms are used for database integration. In data privacy, these algorithms are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected data) with the protected data...
Publication date: 1998